41 research outputs found

    Node Embedding over Temporal Graphs

    In this work, we present a method for node embedding in temporal graphs. We propose an algorithm that learns the evolution of a temporal graph's nodes and edges over time and incorporates these dynamics into a temporal node embedding framework for different graph prediction tasks. We present a joint loss function that creates a temporal embedding of a node by learning to combine its historical temporal embeddings so as to optimize for a given task (e.g., link prediction). The algorithm is initialized with static node embeddings, which are then aligned across the representations of a node at different time points, and finally adapted to the given task in a joint optimization. We evaluate the effectiveness of our approach on a variety of temporal graphs for the two fundamental tasks of temporal link prediction and multi-label node classification, comparing against competitive baselines and algorithmic alternatives. Our algorithm shows performance improvements across many of the datasets and baselines, and is found particularly effective for graphs that are less cohesive, i.e., with a lower clustering coefficient.
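
    A minimal sketch of the combination idea: learn softmax weights over a node's per-snapshot static embeddings and train them jointly against a logistic link-prediction loss. The module names, weighting scheme, and toy data below are illustrative assumptions, not the paper's implementation.

    # Sketch: combine a node's historical embeddings with learned weights,
    # trained end-to-end against a link-prediction loss (illustrative only).
    import torch
    import torch.nn as nn

    class TemporalCombiner(nn.Module):
        def __init__(self, num_steps: int, dim: int):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(num_steps))  # one weight per snapshot
            self.proj = nn.Linear(dim, dim)

        def forward(self, hist: torch.Tensor) -> torch.Tensor:
            # hist: (num_steps, num_nodes, dim) static embeddings per snapshot.
            w = torch.softmax(self.logits, dim=0)
            z = torch.einsum("t,tnd->nd", w, hist)  # weighted sum over time
            return self.proj(z)

    def link_logit(z: torch.Tensor, u: int, v: int) -> torch.Tensor:
        return (z[u] * z[v]).sum()  # dot-product score for candidate edge (u, v)

    hist = torch.randn(5, 100, 32)  # toy: 5 snapshots, 100 nodes, 32 dims
    model = TemporalCombiner(num_steps=5, dim=32)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    pos, neg = (0, 1), (0, 2)  # toy positive/negative node pairs
    for _ in range(10):
        z = model(hist)
        loss = (nn.functional.softplus(-link_logit(z, *pos))
                + nn.functional.softplus(link_logit(z, *neg)))  # logistic loss
        opt.zero_grad(); loss.backward(); opt.step()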

    Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

    The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app, we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images. We leverage this dataset to train a CLIP-based scoring function, PickScore, which exhibits superhuman performance on the task of predicting human preferences. Then, we test PickScore's ability to perform model evaluation and observe that it correlates better with human rankings than other automatic evaluation metrics. Therefore, we recommend using PickScore for evaluating future text-to-image generation models, and using Pick-a-Pic prompts as a more relevant dataset than MS-COCO. Finally, we demonstrate how PickScore can enhance existing text-to-image models via ranking.
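
    A hedged sketch of what a CLIP-based preference scorer does: embed the prompt and candidate images in a joint space and softmax the scaled similarities into preference probabilities. It uses a generic public CLIP checkpoint rather than the actual PickScore weights, so it illustrates only the scoring mechanism.

    # Sketch of a CLIP-style preference scorer (not the PickScore weights).
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def preference_probs(prompt: str, images: list[Image.Image]) -> torch.Tensor:
        inputs = processor(text=[prompt], images=images,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        # logits_per_text: (1, num_images) scaled cosine similarities.
        return out.logits_per_text.softmax(dim=-1).squeeze(0)

    # probs = preference_probs("a watercolor fox", [image_a, image_b])
    # probs[i] then approximates the probability that image i is preferred.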

    AudioGen: Textually Guided Audio Generation

    We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs. AudioGen operates on a learnt discrete audio representation. The task of text-to-audio generation poses multiple challenges. Due to the way audio travels through a medium, differentiating ``objects'' can be a difficult task (e.g., separating multiple people speaking simultaneously). This is further complicated by real-world recording conditions (e.g., background noise, reverberation, etc.). Scarce text annotations impose another constraint, limiting the ability to scale models. Finally, modeling high-fidelity audio requires encoding audio at a high sampling rate, leading to extremely long sequences. To alleviate these challenges, we propose an augmentation technique that mixes different audio samples, driving the model to internally learn to separate multiple sources. We curated 10 datasets containing different types of audio and text annotations to handle the scarcity of text-audio data points. For faster inference, we explore the use of multi-stream modeling, allowing the use of shorter sequences while maintaining a similar bitrate and perceptual quality. We apply classifier-free guidance to improve adherence to text. Compared to the evaluated baselines, AudioGen outperforms on both objective and subjective metrics. Finally, we explore the ability of the proposed method to generate audio continuations, both conditionally and unconditionally. Samples: https://tinyurl.com/audiogen-text2audi
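
    The mixing augmentation can be pictured as follows: overlay two waveforms at a random relative gain and join their captions, so the model is pushed to learn source separation internally. The gain range, peak normalization, and caption template here are illustrative assumptions.

    # Sketch of the mixing augmentation: overlay two clips, join their captions.
    import numpy as np

    def mix_examples(wav_a: np.ndarray, text_a: str,
                     wav_b: np.ndarray, text_b: str,
                     rng: np.random.Generator) -> tuple[np.ndarray, str]:
        n = min(len(wav_a), len(wav_b))          # truncate to the shorter clip
        gain = rng.uniform(0.3, 1.0)             # random relative level for clip B
        mixed = wav_a[:n] + gain * wav_b[:n]
        mixed /= max(1e-8, np.abs(mixed).max())  # peak-normalize to avoid clipping
        return mixed, f"{text_a} and {text_b}"

    rng = np.random.default_rng(0)
    wav, caption = mix_examples(rng.standard_normal(16000), "a dog barking",
                                rng.standard_normal(16000), "rain falling", rng)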

    Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

    We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture, but additionally shows the extreme benefits of scaling up and of tuning on more diverse, instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can perform both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon also demonstrates unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.
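
    The contrastive decoding mentioned above can be sketched in the classifier-free-guidance style: blend conditional and unconditional next-token logits so that tokens favored by the condition gain probability mass. The blending rule and guidance weight below are generic assumptions, not CM3Leon's exact procedure.

    # Sketch of CFG-style contrastive decoding for one sampling step.
    import torch

    def contrastive_logits(cond: torch.Tensor, uncond: torch.Tensor,
                           alpha: float = 3.0) -> torch.Tensor:
        # Push mass toward tokens the condition favors over the unconditional model.
        return uncond + alpha * (cond - uncond)

    cond, uncond = torch.randn(10), torch.randn(10)  # toy 10-token vocabulary
    next_token = torch.distributions.Categorical(
        logits=contrastive_logits(cond, uncond)).sample()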

    Solution structure and electrostatic properties of an SH2 domain/phosphopeptide complex

    SH2 domains are small (~100 amino acid) protein recognition domains, found in numerous proteins involved in signal transduction, that bind sites of tyrosine phosphorylation with high affinity in a sequence-dependent manner. We have focused on the SH2 domains of phospholipase C-γ (PLC-γ), which link activated growth factor receptors, via binding through its two SH2 domains, to the production of the second messengers IP3 and DAG. PLC-γ interacts with the platelet-derived growth factor receptor (PDGFR) at sequences around Tyr 1021 of the PDGFR, and disruption of this interaction results in decreased cell growth following growth factor stimulation. Binding studies using degenerate phosphopeptide libraries suggest that this interaction involves the C-terminal (PLCC), and not the N-terminal (PLCN), SH2 domain of PLC-γ. We have therefore studied the interaction of the PLCC SH2 domain with a 12-amino-acid phosphopeptide representing the sequences around Tyr 1021, using heteronuclear NMR techniques. I was involved in the cloning and purification of this SH2 domain and the preparation of NMR samples of the protein/peptide complex. A full structure determination was performed on this complex in collaboration with Dr. Steve Pascal. During structure determination, I defined the conformation of the phosphopeptide in the complex and identified protein-peptide contacts. Protein/peptide NOEs involving pTyr resonances defined a large, positively charged pocket, containing four arginine residues, that binds this residue. NOEs could not define contacts with the pTyr phosphate group, so we used large downfield chemical shift changes of guanidinium group resonances to do so. pH titration studies demonstrated that the pTyr phosphate group is bound in the -2 charge state, with several residues held in place by a complex hydrogen bonding network to facilitate pTyr binding. A large hydrophobic cavity on the SH2 domain surface bound the six residues C-terminal to pTyr; in particular, the Ile +1 and Pro +3 residues were deeply buried. Thus, a combination of NMR techniques, involving NMR assignment, structure determination, and pH titration studies, provided significant insights into the specific binding of SH2 domains. (Ph.D. thesis; degree grantor: University of Toronto.)
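
    The pH titration step implicitly rests on the standard fast-exchange, single-site protonation model, in which the observed chemical shift interpolates between the protonated and deprotonated limits as a function of pH; a textbook form (not necessarily the exact fitting function used here) is:

        \delta_{obs}(\mathrm{pH}) = \frac{\delta_{HA} + \delta_{A} \cdot 10^{\,\mathrm{pH} - \mathrm{p}K_a}}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}

    Fitting such a curve to the titrating resonances yields the pKa values from which the -2 charge state of the bound phosphate was inferred.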

    EqGNN: Equalized Node Opportunity in Graphs

    Graph neural networks (GNNs) have been widely used for supervised learning tasks on graphs, reaching state-of-the-art results. However, little work has been dedicated to creating unbiased GNNs, i.e., ones whose classifications are uncorrelated with sensitive attributes such as race or gender. Some approaches ignore the sensitive attributes or optimize for the statistical parity fairness criterion; however, it has been shown that neither approach ensures fairness, and both cripple the utility of the prediction task. In this work, we present a GNN framework that allows optimizing representations for the equalized odds fairness criterion. The architecture is composed of three components: (1) a GNN classifier predicting the utility class; (2) a sampler learning the distribution of the nodes' sensitive attributes given their labels, whose samples are fed into (3) a discriminator that distinguishes true from sampled sensitive attributes using a novel ``permutation loss'' function. Using these components, we train a model to discard information about the sensitive attribute only with respect to its label. To the best of our knowledge, we are the first to optimize GNNs for the equalized odds criterion. We evaluate our classifier over several graph datasets and sensitive attributes, and show that our algorithm reaches state-of-the-art results.
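
    In the spirit of the three-component setup, the sketch below shows generic label-conditioned adversarial debiasing: an adversary tries to recover the sensitive attribute from the representation given the label, and the encoder is trained to defeat it. The paper's sampler and ``permutation loss'' are not reproduced; the sizes, loss weighting, and toy data are assumptions.

    # Sketch: label-conditioned adversarial training toward equalized odds.
    import torch
    import torch.nn as nn

    x = torch.randn(64, 16)                  # toy node features
    y = torch.randint(0, 2, (64,))           # utility labels
    s = torch.randint(0, 2, (64,))           # sensitive attribute
    y_onehot = nn.functional.one_hot(y, 2).float()

    encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # stand-in for a GNN
    classifier = nn.Linear(32, 2)
    adversary = nn.Linear(32 + 2, 2)         # sees the label, hence "given y"
    opt_main = torch.optim.Adam(
        list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
    opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()

    for _ in range(200):
        # Adversary step: recover s from the (frozen) representation and y.
        h = encoder(x).detach()
        opt_adv.zero_grad()
        ce(adversary(torch.cat([h, y_onehot], 1)), s).backward()
        opt_adv.step()
        # Main step: predict y while hiding s-given-y from the adversary.
        opt_main.zero_grad()
        h = encoder(x)
        loss = (ce(classifier(h), y)
                - 0.5 * ce(adversary(torch.cat([h, y_onehot], 1)), s))
        loss.backward()
        opt_main.step()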

    Is Oprah Contagious? The Depth of Diffusion of Demand Shocks in a Product Network

    Recent studies have documented that the contagion of information and behaviors in social networks is generally quite limited. We examine whether this pattern also characterizes exogenous demand shocks diffusing in a product network. To this end, we analyze a unique series of demand shocks induced by mass-media book reviews on the Oprah Winfrey television show and in The New York Times. Our identification strategy is based on a difference-in-differences model estimated against two different control groups, constructed via propensity-score matching and via network proximity to a reviewed book, respectively. Our results show that the diffusion of exogenous demand shocks in the Amazon.com product network is relatively shallow, typically reaching about three edges deep into the network, although the economic impact of this diffusion can often be significant. We link our results to recent findings on diffusion in social networks and discuss managerial implications.
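
    The identification strategy is a standard difference-in-differences regression; a minimal sketch on synthetic data (column names, effect sizes, and noise are assumptions) follows.

    # Sketch of the difference-in-differences estimator on synthetic data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 400
    df = pd.DataFrame({
        "treated": rng.integers(0, 2, n),  # 1 = network-proximate to a reviewed book
        "post": rng.integers(0, 2, n),     # 1 = after the review appeared
    })
    # True treatment effect on log sales set to 0.8 for illustration.
    df["log_sales"] = (0.2 * df["treated"] + 0.5 * df["post"]
                       + 0.8 * df["treated"] * df["post"]
                       + rng.normal(0, 1, n))

    fit = smf.ols("log_sales ~ treated * post", data=df).fit()
    print(fit.params["treated:post"])      # DiD estimate of the demand shock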

    KNN-Diffusion: Image Generation via Large-Scale Retrieval

    While the availability of massive text-image datasets has proven extremely useful for training large-scale generative models (e.g., DDPMs, Transformers), their output typically depends on the quality of both the input text and the training dataset. In this work, we show how large-scale retrieval methods, in particular efficient K-nearest-neighbors (KNN) search, can be used to train a model to adapt to new samples. Learning to adapt enables several new capabilities: sifting through billions of records at inference time is extremely efficient and can alleviate the need to train or memorize an adequately large generative model; fine-tuning trained models on new samples can be achieved by simply adding them to the table; and rare concepts, even ones absent from the training set, can be leveraged at test time without any modification to the generative model. Our diffusion-based model trains on images only, by leveraging a joint text-image multi-modal metric. Compared to baseline methods, our generations achieve state-of-the-art results both in human evaluations and in perceptual scores when tested on a public multimodal dataset of natural images, as well as on a collected dataset of 400 million Stickers.
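
    The retrieval step can be sketched as follows: index image embeddings from a joint text-image space and fetch the K nearest neighbors of a text query to condition generation on; "fine-tuning" to new samples then amounts to adding their embeddings to the index. The index type, dimensions, and random embeddings below are assumptions (any joint encoder such as CLIP would fill that role).

    # Sketch of the KNN retrieval table behind a retrieval-conditioned model.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    d, n_index, k = 512, 10_000, 8
    image_embs = np.random.randn(n_index, d).astype(np.float32)
    image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)

    index = NearestNeighbors(n_neighbors=k, metric="cosine").fit(image_embs)

    def retrieve(text_emb: np.ndarray) -> np.ndarray:
        q = text_emb / np.linalg.norm(text_emb)
        _, idx = index.kneighbors(q[None, :])
        return image_embs[idx[0]]          # (k, d) neighbors to condition on

    neighbors = retrieve(np.random.randn(d).astype(np.float32))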